Mining the Web with Active Hidden Markov Models

نویسندگان

  • Tobias Scheffer
  • Christian Decomain
  • Stefan Wrobel
چکیده

Given the enormous amounts of information available only in unstructured or semi-structured textual documents, tools for information extraction (IE) have become enormously important. IE tools identify the relevant information in such documents and convert it into a structured format such as a database or an XML document. While first IE algorithms were hand-crafted sets of rules, researchers soon turned to learning extraction rules from hand-labeled documents. Unfortunately, rule-based approaches sometimes fail to provide the necessary robustness against the inherent variability of document structure, which has led to the recent interest in the use of hidden Markov models (HMMs) [1] for this purpose. Speech recognition and computational biochemistry are well-known applications of HMMs. Markov model algorithms that are used for part-ofspeech tagging, as well as known hiddenMarkovmodels for information extraction [1] require the training documents to be labeled completely, i.e., each token is manually given an appropriate semantic label. Clearly, this is an expensive process. We therefore concentrate on the task of learning hidden Markov models, from partially (sparsely) labeled texts, and develop appropriate EM-style algorithms. By using additional unlabeled documents as they are usually readily available in most applications, we can perform active learning of HMMs. The idea of active learning algorithms is to identify unlabeled observations that would be most useful when labeled by the user. Such algorithms are known for classification, clustering, and regression; we present the first algorithm for active learning of hidden Markov models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

Introducing Busy Customer Portfolio Using Hidden Markov Model

Due to the effective role of Markov models in customer relationship management (CRM), there is a lack of comprehensive literature review which contains all related literatures. In this paper the focus is on academic databases to find all the articles that had been published in 2011 and earlier. One hundred articles were identified and reviewed to find direct relevance for applying Markov models...

متن کامل

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

Data Mining for Web Personalization

In this chapter we present an overview of Web personalization process viewed as an application of data mining requiring support for all the phases of a typical data mining cycle. These phases include data collection and preprocessing, pattern discovery and evaluation, and finally applying the discovered knowledge in real-time to mediate between the user and the Web. This view of the personaliza...

متن کامل

Framework for Social Network Data Mining

Social networks have become a vital component in personal life. People are addicted to social network features, updating their profile page and collaborating virtually with other members have become daily routines. Social networks contain massive collection of data. Web data mining is a new trend in the current research body. This conceptual paper introduces a framework that can be used to mine...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001